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ABSTRACT 

'-^Several  methods  for  nonlinearly  constrained  optimisation  have  been  sug¬ 
gested  in  recent  years  that  are  based  on  solving  a  quadratic  programming  (QP) 
subproblem  to  determine  the  direction  of  search.  Even  for  dense  problems,  there 
is  no  consensus  at  present  concerning  the  "best”  formulation  of  the  QP  sub¬ 
problem.  When  solving  large  problems,  many  of  the  options  possible  for  small 
problems  become  unreasonably  expensive  in  terms  of  storage  and/or  arithmetic 
operations.  This  paper  discusses  the  inherent  difficulties  of  developing  QP-based 
methods  for  large-scale  nonlinearly  constrained  optimisation,  and  suggests  some 
possible  approaches.  < - . 
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The  problem  of  concern  is  the  following:  | 

'  I 

NCP  minimise  F(t) 

iCS* 

subject  to  As  =  I 

e(*){^}0  I 

i  <  X  <  u. 

b 

f 


The  objective  function  F(t)  is  nssumed  to  be  twice- continuously  differentiable. 
The  matrix  A  has  mi  rows;  the  vector  e(t)  contains  a  set  of  twice-continuously 
differentiable  nonlinear  constraint  functions  {c,(*)},  i=  1, . . . ,  m %. 

We  assume  that  the  number  of  variables  and  constraints  in  NCP  is  "large", 
and  that  A  is  sparse.  Obviously,  the  definition  of  "large”  depends  on  the  available 
storage  and  computation  time.  It  will  generally  be  assumed  that  the  number  of 
nonlinear  constraints  is  small  relative  to  the  number  of  linear  constraints. 

No  general  linear  inequality  constraints  have  been  included  in  the  form  NCP 
because  the  methods  to  be  discussed  are  based  on  extensions  of  the  simplex 
method  (see,  e.g.,  Dantsig,  1063).  In  solving  large  linear  programs  (LPs),  in¬ 
equality  constraints  are  converted  to  equalities  by  adding  slack  variables.  The 
purpose  of  this  transformation  is  to  allow  the  simplex  method  to  be  implemented 
with  only  column  operations  on  the  constraint  matrix.  Furthermore,  since  A  is 
stored  in  compact  form,  the  added  slack  variables  do  not  significantly  increase 
the  storage  requirements. 

There  is  still  no  universal  consensus  among  researchers  about  the  "best*  al¬ 
gorithm  for  nonlinearly  constrained  optimisation  in  the  dense  case.  However,  it 
is  generally  agreed  that  methods  based  on  a  quadratic  programming  (QP)  sub¬ 
problem  are  very  effective  (one  class  of  such  methods  will  be  briefly  summarised 
in  Section  3).  Our  concern  in  this  paper  is  with  the  general  effect  of  problem 
sise  on  the  algorithmic  procedures  associated  with  QP-based  methods,  rather 
than  with  a  complete  description  of  a  particular  algorithm.  We  shall  consider 
the  mechanics  of  the  computations  and  the  modifications  that  are  necessary  to 
perform  them  efficiently  (or  at  all). 

As  a  rule,  there  are  fewer  algorithmic  options  for  large  problems,  since 
many  computational  procedures  that  are  standard  for  small  problems  become 
unreasonably  expensive  in  terms  of  arithmetic  and/or  storage.  However,  in 
another  sense  the  options  for  large  problems  are  less  straightforward  because 
of  the  critical  effect  on  efficiency  of  special  problem  structure  and  the  details  of 
implementation. 

When  solving  large  problems,  it  may  be  necessary  to  alter  or  compromise 
what  seems  to  be  an  "ideal”  or  "natural”  strategy.  In  fact,  an  approach  that 
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would  sot  be  considered  for  small  problems  mv  turn  oat  to  be  the  best  choice  for 
some  large  problems.  For  example,  in  sohring  large  LP  problems  bj  the  simplex 
method,  it  is  often  very  expensive  to  compute  a 11  the  Lagrange  multipliers  in  or¬ 
der  to  choose  the  incoming  column  at  a  given  iteration.  With  a  “partial  pricing” 
strategy  (see,  e.g.,  Orchard-Hays,  1968),  only  some  of  the  multipliers  are  com¬ 
puted.  Although  more  iterations  mtj  be  required  to  obtain  the  optimal  solution, 
the  work  per  iteration  is  typically  lower,  and  thus  the  total  computational  effort 
may  be  decreased. 

Similarly,  certain  standard  assumptions  about  the  relative  costs  of  portions 
of  an  algorithm  become  invalid  in  the  large-scale  case.  For  example,  the  measure 
of  efficiency  of  an  algorithm  for  dense  unconstrained  optimisation  is  often  taken 
as  the  number  of  evaluations  of  user-supplied  functions  (e.g.,  the  objective  func¬ 
tion,  the  gradient)  that  are  required  to  reach  a  specified  level  of  accuracy. 
Although  this  measure  is  recognised  to  be  overly  simplistic  (see,  e.g.,  Hillstrom, 
1977;  Lyness  and  Greenwell,  1977),  it  is  nonetheless  a  reasonable  measure  of 
effectiveness  for  most  problems.  This  is  because  the  number  of  arithmetic  opera¬ 
tions  per  iteration  tends  to  be  of  order  »*  at  most,  and  the  amount  of  work 
required  for  storage  manipulation  is  negligible.  However,  even  for  unconstrained 
problems  of  moderate  sise,  the  work  associated  with  linear  algebraic  procedures 
and  data  structure  operations  tends  to  become  significant  with  respect  to  the 
function  evaluations  (see,  e.g.,  the  timing  results  obtained  by  Thapa,  1980). 

The  following  assumptions  and  notation  will  be  used  throughout  the  paper. 
A  local  minimum  of  NCP  will  be  denoted  by  z*.  The  gradient  of  F(x)  will  be 
denoted  by  f(z),  and  its  Hessian  matrix  by  G[z).  The  gradient  of  the  »-th 
nonlinear  constraint  function  c <(z)  will  be  denoted  by  a<(z),  and  its  Hessian  by 
G<(z).  The  first-order  Kuhn- Tucker  conditions  (see,  e.g.,  Fiacco  and  McCormick, 
1968)  will  be  assumed  to  hold  at  z*  so  that  there  exists  a  Lagrange  multiplier 
vector  X*  corresponding  to  the  active  constraints. 

In  an  iterative  method  for  computing  z*,  the  (*  -f-  l)-th  iterate  is  defined  as 


*s+i  =  **  +  °kPk, 


where  ps  is  the  search  direction  and  the  positive  scalar  as  is  the  step  length. 
Usually,  Pm  is  chosen  to  be  a  descent  direction  with  respect  to  some  merit 
function,  end  as  is  chosen  to  produce  a  ‘sufficient  decrease”  in  the  merit  function 
(see  Ortega  and  Rheinboldt,  1970,  for  a  definition  of  ‘sufficient  decrease”). 


t.  Large  teate  Lias  arty  Constrained  Optimisation 


In  this  section,  we  briefly  review  the  key  features  of  an  efficient  method  for 
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large-scale  linearly  constrained  optimisation.  The  problem  format  it  given  by 

minimiie  F(*) 

•object  to  Ax  =  t  t1) 

Ks<v. 

The  algorithm  for  (1)  to  be  described  it  the  reduced-gradient  algorithm 
(Wolfe,  1962)  of  Murtagh  and  Saunders  (1978),  which  has  been  implemented 
in  the  Fortran  program  MINOS  (Murtagh  and  Saunders,  1977).  An  "active  set" 
strategy  is  used  to  compute  the  search  direction;  this  means  that  at  each  iteration 
a  certain  subset  of  the  bounds  are  treated  as  equalities,  and  the  corresponding 
variables  are  held  fixed  at  these  bounds  during  the  iteration.  Let  A  denote  the 
matrix  of  coefficients  corresponding  to  the  active  constraints;  A  will  contain  all 
the  general  constraints  plus  the  active  bounds.  In  order  for  the  same  constraints 
to  be  active  at  the  next  iterate,  the  search  direction  must  satisfy 

Ap  =  0.  (2) 

Let  2  denote  a  matrix  whose  columns  form  a  basis  for  the  null  space  of  A,  so 
that  Az  =  0.  The  relationship  (2)  implies  that  p  must  be  a  linear  combination 
of  the  columns  of  2,  i.e., 

p  =  Zpx  (8) 

for  some  pM. 

In  order  to  define  a  direction  that  satisfies  (2)  when  A  is  large  and  sparse, 
the  matrix  A  is  (conceptually)  partitioned  as  follows: 

A  =  (tf  S  N).  (4) 

The  matrix  B  (for  "basis”,  by  analogy  with  linear  programming)  is  square  and 
non-singular,  and  its  columns  correspond  to  the  basic  variables.  The  columns  of 
N  correspond  to  the  Doabuic  variables  (those  to  be  fixed  on  their  bounds).  The 
columns  of  the  matrix  S  correspond  to  the  remaining  variables,  which  are  termed 
mperbuic.  Note  that  the  number  of  columns  in  B  is  fixed,  but  the  numbers  of 
columns  in  S  and  N  may  vary.  We  emphasise  that  only  column  operations  are 
performed  on  B  as  the  algorithm  proceeds. 

At  a  given  iteration,  the  active  constraints  are  given  by 

c :  7)0-a> 

where  the  components  of  i*  are  taken  from  either  i  or  u,  depending  on  whether 
the  lower  or  upper  bound  is  active. 
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The  matrix  /  uiad  hare  can  be  represented  as 

-rn 

Naturally,  1  and  Z  are  not  computed  explicitly.  Rather,  a  sparse  LU  fac¬ 
torisation  of  B  is  maintained;  periodic  refaetorisation  (often  termed  "reinversion") 
is  used  to  condense  storage  and  regain  accuracy  in  the  factors  (see  Saunders, 
1976;  Reid,  1976). 

The  form  (6)  of  Z  means  that  the  partitioning  of  variables  into  basic,  non- 
basic,  and  superbasic  sets  carries  over  to  the  calculation  of  the  search  direction, 
p.  If  p  is  partitioned  as  (p*  pt  p„),  from  (3)  and  (6)  we  see  that  pH  =  0  and 


i  * 

(6) 


Bpa  =  ~Sp0.  (7) 

Equation  (7)  shows  that  pB  can  be  computed  in  terms  of  pt,  and  thus  the 
superbasic  -variables  act  as  the  "dri-ring  force*  in  the  minimisation.  To  determine 
the  rector  p9,  a  quadratic  approximation  to  the  objectiee  function  is  minimised 
subject  to  the  constraint  (2);  the  search  direction  therefore  "solres*  an  equality- 
constrained  quadratic  program  of  the  form 

miwmin  \?THp  +  §Tp  (8a) 

subject  to  Ap  =  0,  (8b) 

where  H  is  an  approximation  to  G{t),  the  Hessian  matrix  of  F (*).  The  solution  of 
(8)  can  be  obtained  using  only  the  projected  matrix  ZTHZ  (so  that  H  itself  is  not 
required).  In  the  Murtagh-Saunders  algorithm,  a  quasi- Newton  approximation  to 
Z*G(x)Z  is  maintained  in  the  factorised  form  RTR,  where  JR  is  upper  triangular, 
and  pt  is  computed  from 

RTRf.  =  -ZT,  =  -StB~tu  +  («) 

After  pt  and  pa  have  been  computed  from  (9)  and  (7),  the  ralue  of  the  step 
length  is  chosen  to  achieve  a  suitable  reduction  in  F. 

As  long  as  ||£rf||  i*  "large* ,  only  the  basic  and  superbasic  variables  are 
optimised.  If  one  of  these  variables  encounters  a  bound  as  the  iterations  proceed, 
it  is  moved  into  the  set  of  nonbask  variables,  and  the  set  of  active  constraints  is 
altered  aeordingly. 

When  N*rf||  is  "small*,  it  is  considered  that  the  current  iterate  is  "nearly* 
optimal  on  the  current  set  of  active  constraints.  In  this  situation,  we  determine 
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whether  the  objective  function  can  be  farther  reduced  by  releasing  any  nonbasie 
variable  from  its  bound.  This  possibility  is  checked  by  computing  Lagrange 
multiplier  estimates  from  the  system 


We  define  the  vectors  w  and  o  from 

Btt  =  h, 

o  =  Ik  ~  NT*. 


(10) 


(11) 

(12) 


The  system  (10)  is  compatible  when  ZTg  =  0,  since  in  this  case 


g ,  —  StB  Tga  =  ST k. 


The  vector  o  thus  provides  a  set  of  Lagrange  multipliers  for  the  bound  constraints 
that  are  active  on  nonbasie  variables.  If  a  nonbasie  variable  can  be  released  from 
its  bound,  the  iterations  continue  with  an  expanded  superbasic  set. 

The  procedures  of  this  method  diffsr  in  several  ways  from  those  used  in 
the  dense  case.  Firstly,  the  null  space  of  A  is  defined  in  terms  of  a  partition 
of  the  variables,  rather  than  a  matrix  Z  with  orthogonal  columns  (see  Gill 
and  Murray,  1974).  The  expression  (6)  for  Z  indicates  that  an  ill-conditioned 
basis  matrix  B  can  affect  the  condition  of  all  calculations  in  the  algorithm,  and 
may  drastically  alter  the  scaling  of  the  variables.  When  the  columns  of  Z  are 
orthogonal,  ||Zrg||a  <  ||f||a;  otherwise,  ZTg  is  “unsealed*.  Since  an  orthogonal 
Z  is  not  practical  (in  terms  of  storage  or  computation)  for  most  large-scale 
problems,  additional  numerical  difficulties  are  likely  in  the  computation  of  p  and 
the  projected  Hessian  approximation. 

Secondly,  the  multiplier  estimates  computed  from  (10)  are  exact  only  when 
ZTg  =  0,  and  the  neighborhood  in  which  their  sign  is  correct  depends  on  the 
condition  of  B.  Hence,  when  ||Zr9||  is  merely  "small*,  it  may  be  inefficient 
to  release  a  variable  based  on  the  vector  o  from  (12).  Although  a  feasible 
descent  direction  can  be  computed,  the  deleted  constraint  may  very  soon  be  re¬ 
encountered.  This  difficulty  is  less  severe  in  the  dense  case,  where  the  multiplier 
estimates  computed  with  an  orthogonal  Z  will  in  general  have  the  correct  sign 
in  a  much  larger  neighborhood  of  a  constrained  stationary  point  because  the  sise 
of  the  neighborhood  depends  only  on  the  condition  of  A  (see  Gill  and  Murray, 
1979a,  for  further  discussion).  The  increased  unreliability  of  Lagrange  multiplier 
estimates  is  unavoidable  when  Z  is  given  by  (6),  and  must  be  taken  into  account 
in  all  large-scale  optimisation. 
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Finally,  the  cost  of  computing  and  updating  the  factorisation  of  B  is  sub¬ 
stantial,  in  terms  of  both  arithmetic  operations  and  storage  manipulation.  For 
many  large-scale  problems,  the  work  associated  with  performing  the  steps  of  the 
algorithm  completely  dominates  the  cost  of  evaluating  the  nonlinear  objective 
function. 


8.  QP-Bstsd  Methods  fbr  Dense  Problems 

This  section  will  be  concerned  entirely  with  the  treatment  of  nonh'near  con¬ 
straints.  If  the  problem  NCP  contains  only  nonlinear  constraints,  the  assumed 
optimality  conditions  imply  that  x*  is  a  stationary  point  of  the  Lzgrzngizn  func¬ 
tion 

L{z,\)  =  F(z)-\Tt(z), 

when  X  =  X*  (where  l(x)  denotes  the  set  of  constraints  that  hold  with  equality 
at  x*). 

In  a  QP-based  method,  the  search  direction  is  the  solution  of  a  QP  sub¬ 
problem 


minimise 

rest* 

subject  to 


iprtfp  +  g5> 

*{>}* 


(13®) 

(13b) 


The  quadratic  objective  function  (13a)  of  the  QP  subproblem  is  often  viewed 
as  a  quadratic  approximation  to  the  Lagrangfan  function,  in  which  case  gL  and 
H  represent  the  gradient  and  Hessian  of  the  Lagrangian  function,  respectively. 
However,  the  vector  §t  is  usually  taken  as  g(x*);  this  choice  does  not  alter  the 
solution  of  some  QP  subproblems,  and  has  the  benefit  that  the  multipliers  of 
the  subproblem  may  then  be  used  as  estimates  of  the  multipliers  of  NCP.  The 
linear  constraints  (13 b)  of  the  subproblem  are  based  on  linearisations  of  the 
nonlinear  constraints  at  the  current  point,  and  thus  A  in  (131)  usually  includes 
the  constraint  gradients  (a<(s*)}. 

QP-based  methods  have  been  proposed  by  many,  including  Wilson  (1963), 
Murray  (1969),  Biggs  (1972),  Garcia  and  Mangasarian  (1976),  Han  (1976, 1977), 
Wright  (1976),  and  Powell  (1977,  1978).  Although  it  can  be  shown  that  under 
certain  conditions,  QP-based  methods  are  equivalent  to  other  methods  (Tapia, 
1978),  ws  shall  consider  only  methods  in  which  a  QP  subproblem  is  actually 
strived  to  obtain  the  search  direction. 

There  we  substantial  variations  in  formulation  of  the  QP  subproblem  (13). 
Certain  crucial  issues  remain  unresolved:  representation  of  If  in  (13a);  speci¬ 
fication  of  d  in  (131);  treatment  of  an  infeasible  or  unbounded  subproblem; 
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recovery  from  ill-conditioning;  computation  of  reliable  Lagrange  multiplier  es¬ 
timates;  definition  of  a  merit  function  to  be  used  in  defining  the  step  length  a*; 
and  maintenance  (if  possible)  of  super  linear  convergence.  See  Maratos  (1978), 
Chamberlain  (1979),  Murray  and  Wright  (1980)  and  Chamberlain  et  aJ.  (1980) 
for  discussions  of  some  of  these  issues. 

We  shall  mainly  discuss  a  strategy  based  on  formulating  the  QP  subproblem 
(13)  with  only  equality  constraints.  This  will  be  termed  the  equality  QP  (EQP) 
approach,  and  the  subproblem  is  given  by 


minimise 
subject  to 


\pTfIP  +  9TP 

Ap  =  d. 


(14a) 

(148) 


The  t  rows  of  the  matrix  A  represent  a  selection  of  constraints  that  are  considered 
to  be  "active”.  (We  shall  not  be  concerned  with  how  these  constraints  are 
selected.)  The  constraints  (146)  are  assumed  to  be  compatible. 

It  is  convenient  both  conceptually  and  computationally  to  write  the  solution 
of  (14)  as  the  sum  of  two  orthogonal  vectors.  Let  Y  denote  a  matrix  whose 
columns  form  a  basis  for  the  range  space  of  AT\  as  in  Section  2,  Z  will  denote  a 
matrix  whose  columns  form  a  basis  for  the  null  apace  of  A.  The  solution  of  (14) 
can  be  written  as 

p*  =  Yfr  +  Zp.  (15) 

The  vector  pY  is  the  solution  of  the  linear  system 

Arpr  =  l  (is) 


When  the  constraints  (146)  are  consistent,  the  system  (16)  is  compatible;  when 
A  has  full  row  rank,  the  vector  pr  is  unique,  since  AY  is  a  non-singular  matrix. 
An  important  difference  from  the  linear-constraint  case  is  that  in  general  pr 
is  non-acro  (since  d  is  non-sero).  Hence,  we  must  take  account  of  the  range 
space  component  as  well  as  the  null  space  component  in  computing  the  search 
direction. 

The  vector  pK  is  the  solution  of  the  linear  system 

Z tH  ZpK  =  — Z*f  —  ZTHYpr .  (17) 

When  ZrHZ  is  indefinite,  the  interpretation  of  (17)  is  ambiguous;  see  Murray 
and  Wnght  (1980)  for  further  discussion  of  this  point. 

When  A  is  small,  suitable  matrices  Y  and  Z  with  orthonormal  columns  can 
be  obtained  from  the  LQ  factorisation  of  A ,  since  we  have 

AQ  =  A(Y  Z)  =  (L  0). 


QP-B&Md  iirtkodt  for  Lais*-8esJs  OptbnJnttoa 


5» 


The  (n  —  t)  X  (»  —  t)  matrix  ZTHZ  needed  to  compute  pg  can  be  formed 
in  various  ways.  If  second  derivatives  are  available,  H  can  be  taken  as  W,  the 
current  approximation  of  the  Hessian  of  the  Lagrangian  function 


« 


w  =  G(«,)  -  £  X(G*{*k), 
<— 1 


where  {X<}  are  the  current  Lagrange  multiplier  estimates.  If  first  derivatives 
are  available,  the  matrix  WZ  can  be  approximated  by  finite-differences  of  the 
gradient  of  the  Lagrangian  function  along  the  columns  of  Z.  Note  that  the 
matrix  HZ  (rather  than  H  itself)  is  required  to  solve  (17);  thus,  substantial 
efficiencies  are  possible  with  a  discrete  Newton  method,  since  only  n—t  gradient 
evaluations  are  required  to  approximate  WZ,  compared  to  the  possible  n  evalua¬ 
tions  needed  to  approximate  the  full  matrix  W.  A  quasi-Newton  approximation 
to  W  or  ZTWZ  may  also  be  recurred. 


4.  The  Use  of  a  Linearly  Constrained  StibproMent 

Given  the  sophisticated  techniques  available  for  large-scale  linearly  constrained 
optimisation  (see  Section  2),  it  is  logical  to  attempt  to  apply  them  to  the  non- 
linearly  constrained  problem  NCP.  One  possible  way  to  do  so  is  to  pose  a  sequence 
of  linearly  constrained  subproblems  with  a  general  (rather  than  quadratic)  objec¬ 
tive  function.  Such  a  method  was  proposed  by  Robinson  (1972)  and  Rosen  and 
Kreuser  (1972),  and  more  recently  by  several  others  (e.g.,  Van  der  Hoek,  1979). 
The  specific  application  of  this  idea  to  large-scale  problems  using  the  algorithm 
described  in  Section  2  has  been  suggested  by  Rosen  (1978)  and  Murtagh  and 
Saunders  (1980a, b). 

In  order  to  give  the  flavor  of  this  approach,  we  shall  briefly  describe  the 
algorithm  of  Murtagh  and  Saunders.  Let  z*  and  X*  denote  the  current  iterate 
and  the  current  estimate  of  the  Lagrange  multipliers;  other  quantities  subscripted 
by  k  will  denote  those  quantities  evaluated  at  z*.  The  next  iterate  is  obtained 
by  solving  the  linearly  constrained  subproblem 


minimise  F(z)  —  X*  re(x)  +  iyC(z)T«(z) 
subject  to  Ax  =  & 

-<i*(*-**)  j>} — «* 

t  <  X  <u. 


(18) 


The  constraints  of  (18)  that  are  involved  in  A*  are  obtained  by  linearising  all 
the  nonlinear  constraints.  The  function  e(z)  is  defined  as  the  original  nonlinear 


$4  Urn  of « Ltaoorfy  Coaotniaod  8abprobkm  0 

function  minus  its  current  linearisation: 

S(x)  =  c(x)  —  cu  —  A*(*  —  **). 

Note  that  any  iterative  procedure  for  solving  (18)  requires  evaluation  of  the 
problem  functions  (in  contrast  to  solving  a  QP  subproblem,  where  all  the  work 
is  linear-algebraic). 

The  nonlinear  objective  function  of  the  subproblem  (18)  is  called  a  modified 
augmented  Lagrangian  function.  The  penalty  term  JpZ(x)rJ(z)  is  included  to  en¬ 
courage  progress  from  a  poor  starting  point.  When  z*  is  judged  to  be  sufficiently 
close  to  x,  the  penalty  parameter  p  is  set  to  sero  in  order  to  achieve  the  quadratic 
convergence  proved  by  Robinson  (1972,  1974). 

Some  aspects  of  this  approach  to  solving  NCP  are  relevant  to  our  later 
discussion  of  QP-based  methods.  Other  aspects  illustrate  the  compromises  that 
are  often  necessary  in  solving  large-scale  problems. 

For  dense  problems,  methods  based  on  linearly  constrained  subproblems 
have  generally  been  regarded  as  less  efficient  than  QP-based  methods,  in  terms 
of  the  number  of  function  evaluations  required  for  convergence  (see,  e.g.,  the 
comments  in  Murtagh  and  Saunders,  1980a).  For  certain  problem  categories, 
solving  a  more  difficult  subproblem  sometimes  leads  to  an  improvement  in  overall 
efficiency.  However,  the  additional  work  necessary  to  solve  (18)  as  compared  to  a 
QP  does  not  appear  to  produce  a  comparable  increase  in  efficiency  for  problems 
in  which  the  overhead  associated  with  performing  linear  algebraic  procedures  is 
small  relative  to  the  cost  of  evaluating  the  nonlinear  functions.  Whether  the 
tradeoff  will  be  different  in  the  sparse  case  is  still  unknown. 

A  method  based  on  (18)  generates  the  next  "outer”  iterate  through  a  sub¬ 
problem  whose  solution  also  requires  an  iterative  procedure  (which  generates 
"inner”  iterates).  In  the  MINOS/AUGMENTED  implementation  of  this  method 
(Murtagh  and  Saunders,  1980b),  a  limit  is  imposed  on  the  number  of  inner  itera¬ 
tions.  If  the  maximum  number  of  such  iterations  is  reached,  the  inner  procedure 
is  terminated,  and  its  final  iterate  is  taken  as  the  next  outer  iterate.  It  seems 
essential  to  impose  such  a  limit  in  the  large-scale  case,  since  it  is  unlikely  that  the 
initial  Jacobian  approximation  and  multiplier  estimates  will  remain  appropriate 
if  hundreds  of  iterations  are  required  to  reach  optimality  for  the  subproblem. 
The  effects  of  such  premature  termination  remain  to  be  analysed. 

A  "good*  choice  for  the  penalty  parameter  p  is  crucial  in  the  success  and 
efficiency  of  the  method  on  certain  problems.  The  considerations  in  selecting 
p  are  similar  to  those  in  an  augmented  Lagrangian  method  (for  a  survey,  see 
Fletcher  1974).  A  too-small  value  of  p  may  lead  to  excessive  constraint  violations 
in  the  solution  of  (18),  an  unbounded  subproblem  (18),  or  a  poorly  conditioned 
Hessian  of  the  augmented  function.  A  too-large  value  of  p  may  also  cause  the 
Hessian  to  be  ill  conditioned;  it  may  have  the  additional  undesirable  effect  of 
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forcing  the  iterates  to  follow  the  constraint  boundary  very  closely.  Furthermore, 
the  decision  as  to  when  to  set  p  to  sero  is  not  straightforward. 

The  value  of  X*  in  (18)  is  taken  as  the  multiplier  vector  of  the  previous  sub¬ 
problem.  If  the  previous  subproblem  was  solved  to  optimality,  this  ensures  that 
the  multipliers  corresponding  to  inequality  constraints  have  the  correct  sign,  and 
that  X,  =  0  for  inactive  constraints  regardless  of  the  partition  (B  S).  However, 
it  means  that  multiplier  estimates  are  not  computed  with  the  most  recent  infor¬ 
mation,  but  rather  are  based  on  the  “old"  Jacobian.  In  addition,  the  interpretar 
tion  of  the  available  Lagrange  multiplier  estimates  is  further  complicated  if  an 
inner  iteration  is  terminated  before  convergence. 


8.  Extension  of  QP- Bated  Methods  to  the  Large-Scale  Case 

In  the  remainder  of  tins  paper,  we  shall  consider  some  of  the  issues  in  developing 
a  QP-based  method  such  as  those  described  in  Section  3  for  the  problem  NCP.  It 
is  assumed  that  the  sparsity  pattern  of  the  Jacobian  of  the  nonlinear  constraints 
is  known  a  priori. 

Even  for  dense  problems,  linear  constraints  (especially  bounds)  should  be 
treated  separately  from  nonlinear  constraints.  If  they  are  not,  considerable 
inefficiencies  tend  to  be  introduced  into  solution  methods;  furthermore,  the 
iterates  will  not  in  general  be  feasible  with  respect  to  the  linear  constraints.  We 
believe  that  QP-based  methods  should  treat  any  linear  constraints  as  if  they  were 
the  only  constraints  in  the  problem,  in  order  to  take  advantage  of  the  efficiencies 
associated  with  purely  linear  constraints,  and  to  ensure  that  the  nonlinear  func¬ 
tions  are  evaluated  only  at  points  that  satisfy  the  linear  constraints. 

When  this  approach  is  taken  in  an  EQP  method,  the  matrix  A  in  (146)  will 
include  the  general  linear  constraints  and  active  bounds  as  well  as  the  current 
gradients  of  the  nonlinear  constraints.  An  EQP  method  should  therefore  include 
a  strategy  to  exploit  the  fact  that  only  part  of  A  changes  from  one  iteration  to 
the  next,  since  the  rows  of  A  that  correspond  to  linear  constraints  and  simple 
bounds  remain  constant. 

In  the  dense  version  of  the  EQP  method  discussed  in  Section  3,  the  search 
direction  was  represented  (and  computed)  in  terms  of  matrices  Y  and  Z  obtained 
from  the  LQ  factorisation  of  A.  A  similar  representation  of  the  solution  of  (14) 
must  be  developed  that  is  suitable  for  large-scale  problems.  In  this  section,  we 
sketch  one  possible  approach,  which  is  similar  to  that  described  in  Section  2  for 
the  large-scale  linear-constraint  case. 

The  variables  are  partitioned  into  basic,  superbasic,  and  nonbasic  sets,  with 
a  corresponding  partition  of  the  columns  of  A  and  the  components  of  p  and  f. 
Since  the  nonbasic  variables  are  fixed  on  their  bounds  during  a  given  iteration, 
the  vector  pH  must  be  sero  (and  can  be  ignored).  To  satisfy  the  linear  constraints 


1 


I* 

(146),  it  moat  hold  that 
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(*  *>(£)  =  *  («) 

where  B  it  a  t  X  t  square  non- singular  matrix,  and  d  contains  appropriate 
elements  of  <2.  From  (19)  it  follows  that 


Bpa  =  d—Sp,. 


(20) 


Note  that  the  components  of  d  will  be  sero  in  positions  corresponding  to  linear 
constraints.  Hence,  for  anj  pMl  the  definition  of  pa  bj  (20)  ensures  that  p  will 
be  feasible  with  respect  to  the  linear  constraints  of  both  Idle  subproblem  and  the 
original  problem. 

The  -vector  pt  is  determined  by  minimisation  of  the  quadratic  objective 
function  (14a).  Writing  this  objective  in  terms  of  the  partitioned  vector  p,  we 
obtain 

\pTJ*  aPa  +  Pl^aaP,  +  ^pJ^.P.  (21) 

+  APa  +  tlPa» 


where 


Hb  Haa 

Hi,  Ha 


) 


Substituting  for  pa  using  (20)  makes  (21)  a  quadratic  function  in  p,  alone. 
The  optimal  p«  is  the  solution  of  a  system  of  equations  exactly  analogous  to  (17) 
for  the  dense  case: 


ZrHZpa  = 


-zT,+zTH(Boiiy 


<») 


where  Z  is  given  by  (6). 

At  a  typical  iteration,  B  is  given  by 

If  we  assume  that  the  linear  constraints  are  placed  first,  the  first  ti  rows  (the 
matrices  B\  and  B*)  correspond  to  the  linear  constraints,  and  the  last  t%  rows 
(the  matrices  Hs  and  £«)  correspond  to  the  nonlinear  constraints.  Both  B\  and 
#4  are  square. 

With  the  EQP  approach  described  in  Section  3,  the  matrix  A  in  (14)  includes 
only  the  gradients  of  the  active  nonlinear  constraints.  The  question  therefore 
arises  of  how  to  treat  nonlinear  inequalities  in  large-scale  problems.  The  reasons 
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noted  in  Section  1  for  adding  slack  'variables  to  linear  inequality  constraints 
also  apply  to  nonlinear  inequality  constraints.  However,  there  might  appear  to 
be  some  disadvantages.  In  particular,  keeping  all  nonlinear  constraints  as  rows 
of  B  would  seem  to  result  in  increased  housekeeping  costs,  as  well  as  wasted 
computational  effort  in  computing  the  gradients  of  inactive  constraints.  In  fact, 
neither  of  these  disadvantages  applies,  and  the  slack  variables  corresponding  to 
nonlinear  inequalities  can  be  included  with  very  little  cost.  The  elements  of  the 
search  direction  corresponding  to  the  slack  variables  of  the  nonlinear  constraints 
can  be  ignored,  and  the  line  search  involves  only  the  original  variables.  The 
value  of  a  nonlinear  slack  variable  at  the  next  iterate  is  given  by  the  recomputed 
constraint  value,  which  is  used  to  determine  whether  the  slack  variable  is  basic. 
All  the  other  coefficients  in  the  row  of  the  Jacobian  associated  with  a  basic  slack 
variable  can  be  set  to  sero,  and  there  is  no  need  to  compute  the  gradient  of  the 
corresponding  constraint. 


I.  Representing  the  Basis  Inverse 

In  this  section,  we  consider  methods  for  representing  B~ 1  as  the  iterations  of 
an  EQP  method  proceed.  The  inverse  is  never  represented  explicitly.  However, 
we  use  this  terminology  because  the  methods  to  be  described  solve  the  linear 
systems  that  involve  B  without  a  complete  factorisation  of  B. 

Changes  in  the  columns  of  B  that  result  as  variables  move  on  and  off  bounds 
can  be  carried  out  exactly  as  in  the  linear-constraint  case.  The  difficulty  in 
a  nonlinearly  constrained  problem  is  that  the  last  t2  rows  of  B  will  change 
at  each  iteration  due  to  constraint  nonlinearities.  We  assume  that  it  is  not 
computationally  feasible  to  refaetorise  B  at  every  iteration;  however,  periodic 
refactorisation  will  be  performed  to  condense  storage  and  ensure  accuracy  in  the 
factors. 

If  both  t2  and  the  number  of  non-sero  elements  in  the  last  fa  rows  of  B  are 
small,  the  changes  in  B  due  to  constraint  nonlinearities  represent  only  a  small 
number  of  column  changes.  In  this  case,  it  would  be  practical  to  update  the  LU 
factors  of  B  in  a  standard  fashion  (see,  e.g.,  Forrest  and  Tomlin,  1972;  Reid, 
1976;  Saunders,  1976).  However,  each  iteration  would  involve  several  column 
updates,  and  hence  refactorisation  would  be  required  at  more  frequent  intervals. 


6.1.  Partitioning.  Since  B\  includes  only  linear  constraints,  it  is  possible  to 
recur  a  factorisation  of  B\  from  iteration  to  iteration.  This  fact  can  be  utilised 
to  advantage  because  systems  of  equations  involving  B  or  BT  can  be  solved  using 
factorisations  of  Bt  and  a  matrix  the  sise  of  B«. 


Am  Appnxtmit*  bww 
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For  example,  if  the  rector  ft  is  partitioned  corresponding  to  (23)  as  (ftt  fta), 
the  solution  of  Bx  =  ft  can  be  represented  as 

-CD 

where  the  rectors  uj,  u?  and  Vi  are  calculated  from 

fli«i  =  fti, 

IH»i  =  fta  —  BjUi,  (24) 

Bi«a  =  — Ba«i, 

where 

D  =  fl4-  BsB^lBr  (25) 

This  procedure  is  sometimes  described  as  a  partitioned  inrerse  technique 
(see,  e.g.,  Faddeera,  1959).  The  matrix  (25)  is  called  the  Schur  complement  (see, 
e.g.,  Cottle,  1974).  The  steps  of  (24)  are  equiralent  to  block  Gaussian  elimination 
on  B,  with  B\  as  the  first  block. 

If  tj  >  ta,  the  main  work  in  (24)  inrolres  obtaining  Bf'Ba  (or  Bf  rBf) 
in  (25).  To  reduce  the  work  in  this  calculation,  it  is  helpful  to  maximise  the 
number  of  sero  columns  of  Ba  and/or  Bj.  This  can  be  borne  in  mind  whenerer 
B\  is  refactorised,  since  there  are  some  degrees  of  freedom  in  deciding  which 
rariables  are  to  be  basic  and  superbasic.  Once  the  LU  factors  of  B\  are  available, 
the  matrices  needed  to  compute  D  can  be  obtained  by  forming  L~xBa  and 
U~TBl  In  the  large-scale  case,  however,  it  will  usually  be  more  efficient  to 
compute  U~lL~lBa  or  L~TU~TBl,  depending  on  whether  L  and  U  are  stored 
by  columns  or  rows. 

Although  Bi  is  required  to  be  a  fixed  sise  with  this  approach,  the  number  of 
active  nonlinear  constraints  may  vary.  Therefore,  it  is  not  necessary  to  include 
slack  variables  for  the  nonlinear  inequality  constraints. 


it  An  Approximate  Inverse;  Iterative  Improvement.  An  obvious  strategy  for 
overcoming  the  difficulties  of  updating  B”1  as  its  last  rows  change  is  simply 
not  to  update  it.  The  technique  of  retaining  a  constant  Jacobian  or  Hessian 
approximation  in  Newton-type  methods  is  widely  used  (see,  e.g.,  Dennis,  1971), 
and  has  been  thoroughly  analysed.  With  the  approach  described  in  Section  4, 
the  linear  constraints  remain  constant  until  the  general  subproblem  (18)  has  been 
solved. 

This  idea  and  its  extensions  can  be  applied  to  a  QP- based  method  in  several 
ways.  Let  B~l  denote  an  available  representation  of  an  approximation  to  the 
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iimrw  of  B  (e.g.,  from  the  mock  recent  factorisation  or  some  previous  iteration). 
Wi  shall  mention  two  possible  strategies  for  using  B~l  to  "solve*  systems  of 
equations  such  as  £x  =  i.  Firstly,  we  can  simply  solve  the  system  using  £~l; 
in  effect,  this  involves  substituting  B  for  B  during  some  number  of  consecutive 
iterations.  Secondly,  B~l  could  be  used  further  in  an  iterative  improvement 
procedure  (see,  e.g.,  Wilkinson,  1965),  assuming  that  B  is  also  available. 

Such  approximations  are  acceptable  in  QP-based  methods  because  the  linear 
constraints  of  the  QP  subproblem  are  typically  derived  from  optimality  condi¬ 
tions,  and  the  precise  specification  of  the  linear  constraints  is  critical  only  near 
the  solution.  Consequently,  there  is  substantial  freedom  to  define  the  constraints 
(141)  when  *k  is  not  close  to  sf,  provided  that  a  sufficient  decrease  in  the  merit 
function  can  be  guaranteed. 

When  B  is  the  basis  matrix  from  a  previous  iteration,  the  error  in  the 
approximate  inverse  is  of  a  special  form  because  the  first  ti  rows  of  B  are 
constant.  In  general,  B  satisfies 

B=B+F=B+f ° V 

where  the  matrix  A  represents  the  change  in  gradients  of  the  nonlinear  con¬ 
straints.  Therefore,  we  have 

Because  of  the  relationship  between  B  and  B,  the  structure  of  the  error  in 
the  approximate  inverse  is  such  that  the  equations  (141)  corresponding  to  linear 
constraints  are  always  satisfied  •exactly",  even  if  £  is  used  rather  than  B.  In 
general,  pa  should  satisfy 

Bp,  =  d-Sp,. 

If  Pa  is  defined  instead  from 

Bfm  =  d  —  Spa, 

then  it  follows  that 

where  d>  and  4s  denote  the  first  ti  and  last  tg  components  of  d,  respectively. 
Thus,  when  pa  Is  used  instead  of  pa,  the  equalities  of  the  QP  subprobtem 
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corresponding  to  tho  Jfnear  constraints  remain  satisfied  (with  exact  arithmetic), 
regardless  of  ||Afl. 

It  is  also  indnethv  to  considor  tbs  siso  of  tha  arror  that  arisas  from  using 
B  instaad  of  B.  Lat  a  ha  tha  exact  solution  of  Ss  =  ft,  and  1st  h  ha  tho  rector 
such  that  B{s  +  h)  —  ft.  Ass nminf  that  (|£~'1f,||  <  1,  it  can  be  shown  that 

m  ^  *m 

m  -  mv-HB-'F\\y 

whore  k  is  tha  condition  nnmbor  of  B.  Thus,  whan  ||A||  is  small  and  B  is  not 
too  ill-conditioned,  tha  relatire  error  in  a  is  bounded.  (Note  also  that  tha  bound 
is  independent  of  ||ft|).  This  is  important  because  tha  right-hand  side  of  (14ft) 
approaches  sero  as  the  iterates  converge,  and  it  would  be  unacceptable  for  the 
bound  on  the  relative  error  in  the  computed  solution  to  increase.) 

If  we  hare  the  exact  triangular  factors  L  and  U  of  B  and  can  apply  B  to 
form  the  residual  rector,  then  (with  exact  arithmetic)  an  iterative  improvement 
procedure  for  solving  Bs  =  t  will  converge  if 

i/-s-*bi<i. 

With  this  approach,  B  must  remain  a  fixed  sise,  so  that  slack  variables  must 
be  included  for  the  nonlinear  inequality  constraints  (in  contrast  to  the  method  of 
Section  6.1,  where  only  the  inequalities  currently  considered  active  were  included 
inS). 


7.  The  taarah  DbeeMoa  for  the  teparhaik  Variables 

Given  that  we  can  obtain  a  representation  of  B~l  (and  hence  of  Z),  a  second 
issue  in  implementing  a  QP-based  method  for  large-scale  problems  is  how  to  solve 
the  equations  (22)  for  p9.  The  difficulty  is  that  the  storage  and  computation 
associated  with  forming  ZTHZ  (or  ZTH)  may  be  prohibitive.  Since  H  is  n  x  n, 
there  will  in  general  be  inadequate  storage  to  retain  a  full  version  of  H. 

In  many  cases,  the  dimension  of  the  projected  Hessian  matrix  ZTHZ  will 
be  relatively  small  at  every  iteration,  even  when  the  problem  dimension  is  large. 
If  ZtHZ  is  small  enough  to  be  stored,  standard  approaches  from  the  dense  case 
may  be  used.  For  example,  a  quasi- Newton  approximation  of  tha  projected 
Hessian  of  the  Lagrangian  function  may  be  maintained  using  update  procedures 
similar  to  those  in  the  linear-constraint  case.  Any  questions  concerning  such 
procedures  apply  generally  to  nonlinear ly  constrained  optimisation,  and  are  not 
particular  to  large-scale  problems.  However,  the  technique  of  computing  finite- 
difbrances  along  tha  columns  of  1,  which  is  very  successful  for  small  problems, 
is  too  expensive  in  the  large-scale  case  because  of  tha  sflbrt  required  to  form 
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2.  Furthermore,  mn  if  W  itMlf  it  available,  it  is  probably  too  costly  to  form 

Fortunately,  another  alternative  is  available  when  limitations  of  storage 
and/or  computation  preclude  an  explicit  representation  of  2rH  2.  Although  vie 
shall  discuss  the  method  in  the  context  of  noniinearly  constrained  optimisation, 
it  is  equally  applicable  to  large-scale  linearly  constrained  optimisation  (e.g.,  in 
the  method  discussed  in  Section  2).  Furthermore,  it  may  be  useful  even  when 
the  product  2rH2  can  be  stored,  but  is  too  costly  to  compute. 

The  linear  conjugate-gradient  method  (Hestenes  and  Stiefel,  1952)  is  an 
iterative  procedure  for  solving  the  linear  system 

ftp  =  -9,  (26) 

where  ft  is  symmetric  and  positive  definite,  without  explicitly  storing  the  matrix 
ft.  Rather,  a  sequence  of  iterates  {/><}  is  generated,  using  only  products  of  ft 
with  vectors.  The  vectors  {pt}  will  be  referred  to  as  Zznear  iterates,  and  the  exact 
solution  of  (26)  will  be  termed  the  Newton  direction.  The  vector  §  is  usually  the 
gradient  (or  projected  gradient)  of  some  nonlinear  function  4. 

Conjugate-gradient  methods  are  relevant  to  solving  (22)  because  the  product 
of  2tH2  and  a  vector  v  can  in  some  circumstances  be  computed  efficiently  even 
when  2tH2  is  not  available.  For  example,  if  ft  in  (26)  is  of  the  form  2rH2, 
where  2  is  given  by  (6)  and  H  is  sparse,  in  general  ft  will  be  a  dense  matrix. 
However,  if  H  can  be  retained  in  sparse  form,  and  2  and  ZT  can  be  applied  as 
noted  in  Section  2,  the  product  ftv  can  be  formed  efficiently. 

A  sparse  matrix  H  can  be  obtained  in  several  different  ways.  It  may  happen 
that  the  Hessian  of  the  Lagrangian  function  ( W )  is  sparse,  with  a  known  sparsity 
pattern.  (This  situation  is  less  likely  than  in  the  unconstrained  case,  because  the 
Hessians  of  all  the  active  constraints  as  well  as  the  objective  function  must  be 
sparse.)  In  this  case,  techniques  are  available  for  analysing  the  sparsity  pattern 
and  determining  special  finite-difference  vectors  that  permit  an  approximation 
to  W  to  be  computed  with  relatively  few  evaluations  of  the  relevant  gradients 
(see,  e  g .,  Curtis,  Powell  and  Reid,  1974;  Powell  and  Toiut,  1979).  Alternatively, 
a  sparse  quasi-Newton  approximation  to  W  (see,  e.g.,  Toint,  1977;  Dennis  ami 
Schnabel,  1978;  Shanno,  1979)  might  be  developed.  Although  our  experience 
with  sparse  quasi-Newton  updates  has  been  disappointing  even  in  the  uncon¬ 
strained  case  (see  Thapa,  1980),  any  improvements  in  such  methods  can  be  ap¬ 
plied  directly. 

If  the  Hessian  of  the  Lagrangian  function  is  not  sparse,  it  is  possible  to 
estimate  the  vector  W2v  by  a  finite-difference  along  the  vector  Is.  Obviously, 
this  computation  requires  additional  evaluations  of  the  problem  functions. 

A  conjugate  gradient  method  will  be  useful  in  solving  (22)  only  if  the  linear 
iterates  converge  rapidly;  by  assumption,  it  is  reasonable  to  compute  a  relative!? 
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until  number  of  matrix-vector  products  involving  ZrHZ .  Hence,  it  is  essential 
to  precondition  the  conjugate- gradient  method  (see,  e.g.,  Axelsson,  1977).  Let 
C  be  a  positive-definite  symmetric  matrix.  The  solution  of  (26)  can  be  found  by 
solving  the  system 

and  forming  p  =  C~ly.  Let  K  denote  the  matrix  C~lRC~l\  K  has  the  same 
eigenvalues  as  C~lR,  since  they  are  similar  matrices  (C~lKCl  =  C~XR). 
Since  the  linear  conjugate-gradient  method  is  known  to  converge  very  rapidly 
when  the  coefficient  matrix  has  clustered  eigenvalues,  the  preconditioning  matrix 
should  be  chosen  so  that  as  many  as  possible  of  the  eigenvalues  of  C~XR  are 
close  to  unity. 

When  the  projected  Hessian  is  small  enough  to  be  stored  explicitly,  precon¬ 
ditioning  allows  second-order  information  to  be  used  in  conjunction  with  a  quasi- 
Newton  method.  Thus,  if  a  quasi- Newton  approximation  of  is  maintained 
(e.g.,  as  RtR),  instead  of  computing  the  quasi-Newton  search  direction  from 
RTRpi  =  -f  as  in  (9),  we  could  solve 


R~TZTWZRrxy  =  -R~t§ 

by  the  conjugate-gradient  method,  and  then  take  the  Newton  direction  as  p  = 
ZR~xy.  The  preconditioning  matrix  may  be  modified  during,  or  after  the 
completion  of,  the  iterations  of  the  conjugate-gradient  method. 

The  truncated  Newton  method  of  Dembo  and  Steihaug  (1980)  "solves”  (26) 
by  performing  a  limited  number  of  iterations  of  the  linear  conjugate-gradient 
method.  The  final  iterate  of  the  truncated  sequence  is  then  taken  as  an  ap¬ 
proximate  solution  of  (26).  If  a  single  linear  iteration  is  used,  p  will  be  the 
steepest-desceut  direction  —§.  Thus,  the  truncated  Newton  algorithm  computes 
a  vector  that  interpolates  between  the  steepest-descent  direction  and  the  Newton 
direction. 

Dembo  and  Steihaug  show  that,  if  R  is  positive  definite  and  the  initial  iterate 
of  the  linear  conjugate-gradient  scheme  is  the  steepest-descent  direction  —  f,  all 
succeeding  linear  iterates  will  be  directions  of  descent  with  respect  to  ♦.  Gill, 
Murray  and  Nash  (1981)  show  how  to  generate  a  sequence  of  descent  directions 
for  the  case  when  R  may  be  indefinite. 

The  hope  with  a  truncated  Newton  method  is  to  reduce  the  required  num¬ 
ber  of  linear  conjugate-gradient  steps,  and  the  use  of  preconditioning  would 
therefore  seem  to  be  essential.  An  additional  benefit  can  also  be  produced  by  a 
preconditioning  strategy.  In  many  optimisation  methods,  the  search  direction  p 
is  computed  implicitly  or  explicitly  as 


P  =  -Wf, 
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where  M  it  a  positive-daflnite  matrix;  for  example,  limited-memory  quasi- Newton 
methods  define  hi  as  a  low-rank  modification  to  the  identity  matrix  (see  Shanno, 
1978).  If  the  matrix  M  is  used  to  precondition  R,  the  sector  —Mf  is  the  first 
member  of  the  linear  conjugate-gradient  sequence,  and  is  more  likely  to  give  a 
good  reduction  in  the  function  than  the  negative  gradient;  see  Gill,  Murray  and 
Nash  (1981)  for  further  details. 


I.  An  InequaHty  QP  Approach 

In  this  section,  we  briefly  mention  an  alternative  formulation  of  the  QP  sub¬ 
problem  —  as  an  inequality-constraint  QP  (IQP).  (Escudero,  1980,  also  discusses 
IQP  subproblems  for  large-scale  problems.)  In  this  case,  the  relational  operator 
associated  with  an  original  nonlinear  constraint  is  carried  over  to  the  subproblem 
(i.e.,  inequalities  in  the  original  problem  become  inequalities  in  the  subproblem), 
and  the  general  form  of  the  subproblem  is  given  by  (13). 

Because  an  IQP  subproblem  contains  inequalities,  it  must  be  solved  by 
an  iterative  QP  algorithm.  In  general  (assuming  that  all  the  variables  appear 
nonlinearly),  a  full  n  X  «  matrix  H  must  be  available,  since  it  will  not  be  known  a 
priori  which  set  of  constraints  will  hold  as  equalities  at  the  solution.  In  addition, 
a  "phase  P  procedure  will  typically  be  required  to  find  a  feasible  point  with 
respect  to  the  constraints. 

All  the  suggestions  made  concerning  an  EQP  subproblem  can  be  applied 
to  an  IQP  subproblem,  since  most  QP  algorithms  are  based  on  an  active  set 
strategy  (see  Cottle  and  Djang,  1979).  Note  that  the  two  approaches  differ  only 
when  more  than  one  iteration  is  needed  to  solve  the  IQP.  Therefore,  solving  an 
IQP  subproblem  is  always  more  work  than  solving  an  EQP  subproblem.  As  in 
the  algorithm  of  MINOS/ AUGMENTED,  it  seems  essential  to  limit  the  number 
of  iterations  to  be  performed  in  solving  the  subproblem. 

For  solving  a  large-scale  problem,  an  IQP  approach  could  be  implemented 
using  a  sparsity-exploiting  QP  method  to  solve  (27)  —  for  example,  Tomlin’s 
(1976)  implementation  of  Lemka’s  method  (Lemke,  1965).  Most  methods  of  this 
type  are  based  on  "pivoting”  operations  with  the  extended  matrix 


Thus,  there  is  a  need  is  to  develop  eflbctive  variants  of  these  methods  when  a 
sequence  of  IQP  subproblems  must  be  solved  that  are  related  in  the  special  ways 
noted  earlier.  In  particular,  only  the  last  few  rows  of  A  may  vary;  or  H  may  be 
modified  by  a  low-rank  matrix  if  certain  quasi-Newton  techniques  are  used  to 
approximate  the  Hessian  of  the  Lagrangian  function. 
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We  hare  indicated  tome  of  the  compromiies  necessary  to  implement  QP-based 
methods  for  large-scale  nonlinear ly  constrained  optimisation.  As  in  the  linear- 
constraint  case,  the  search  direction  can  no  longer  be  computed  with  “ideal* 
numerical  procedures.  Furthermore,  it  may  be  helpful  to  alter  the  formulation 
of  the  subproblem  in  the  interests  of  computational  efficiency. 

It  is  unclear  whether  the  superiority  of  QP-based  methods  in  the  dense 
case  will  carry  osar  to  large-scale  problems  with  bounds,  linear  and  nonlinear 
constraints.  The  alternatives  now  available  involve  a  higher-level  subproblem, 
and  may  be  less  flexible  in  adapting  the  subproblem  to  the  unpredictability  of 
nonlinear  constraints.  However,  they  benefit  from  the  ability  to  use  directly  the 
existing  codes  for  large-scale  linearly  constrained  optimisation.  Thus,  the  price 
paid  for  the  greater  flexibility  of  a  QP-based  method  is  a  considerable  increase 
in  programming  complexity,  and  a  reduced  ability  to  use  existing  software. 


References 

Axelsson,  O.  (1977).  “Solution  of  linear  systems  of  equations:  iterative  methods”, 
in  Sparse  Matrix  Technique!  (V.A.  Barber,  ed.),  Springer- Verlag  Lecture 
Notes  in  Mathematics  572,  Berlin,  Heidelberg  and  New  York. 

Biggs,  M.  C.  (1972).  “Constrained  minimisation  using  recursive  equality  quad¬ 
ratic  programming”,  in  Numerical  Method s  for  Non-linear  Optimisation  (F. 
A.  Lootsma,  ed.),  pp.  411-428,  Academic  Press,  London  and  New  York. 

Chamberlain,  R.  M.  (1979).  Some  examples  of  cycling  invariable  metric  methods 
for  constrained  minimisation,  Math.  Prog.  18,  pp.  378-383. 

Chamberlain,  R.  M.,  Lemartchal,  C.,  Pederson,  H.  C.,  and  Powell,  M.  J.  D. 
(1980).  The  watchdog  technique  for  forcing  convergence  in  algorithms  for 
constrained  optimisation,  Report  DAMTP  80/NA  1,  University  of  Cam¬ 
bridge. 

Cottle,  R.  W.  (1974).  Manifestations  of  the  Schur  complement,  Linear  Alg. 
Appiica.  8,  pp.  189-211. 

Cottle,  R.  W.  and  Djaag,  A.  (1979).  Algorithmic  equivalence  in  quadratic 
programming,  I:  a  least  distance  programming  problem,  J.  Opt.  Th.  Applies. 
28,  pp.  275-301. 

Curtis,  A.  R.,  Powell,  M.  J,  D.  and  Rekl,  J.  K.  (1974).  On  the  estimation  of 
sparse  Jacobian  matrices,  J.  hut.  Maths.  Applies.  18,  pp.  117-119. 


20 


Dautiig,  G.  B.  (1963).  Linear  Programming  and  Extensions,  Princeton  University 
Pron,  Princeton,  Now  Jerse y. 

Dombo,  R.  8.  and  Steihaag  T.  (1900).  Truncated- Newton  algorithms  for  large- 
scalo  nnconstraiaad  optimisation,  Working  paper  #40,  School  of  Organisation 
and  Management,  Yale  University. 

Dennis,  J.  E.,  Jr.  (1971).  "Toward  a  unified  convergence  theory  for  Newton-like 
methods",  in  Nontimeor  Functional  Analysis  and  Applications  (L.  B.  Rail, 
ed.),  pp.  425-472,  Academic  Press,  New  York. 

Dennis,  J.  E.,  Jr.  and  Schnabel,  R.  B.  (1975).  Least-change  secant  updates  for 
quasi-Newton  methods,  Technical  Repent  78-344,  Department  of  Computer 
Science,  Cornell  University. 

Eseudero,  L.  (1980).  A  projected  Lagrangiaa  method  for  nonlinear  programming, 
Report  G320-3401,  IBM  Palo  Alto  Scientific  Center. 

Faddeeva,  V.  N.  (1959).  Computational  Methods  of  Linear  Algebra,  Dover 
Publications,  New  York. 

Fiacco,  A.  V.  and  McCormick,  G.  P.  (1968).  Nonlinear  Programming:  Sequential 
Unconstrained  Minimisation  Techniques,  John  Wiley  and  Sons,  New  York. 

Fletcher,  R.  (1974).  "Methods  related  to  Lagrangian  functions”,  in  Numerical 
Methods  for  Constrained  Optimisation  (P.  E.  Gill  and  W.  Murray,  eds.), 
pp.  219-240,  Academic  Press,  London  and  New  York. 

Forrest,  J.  J.  H.  and  Tomlin,  J.  A.  (1972).  Updating  triangular  factors  of  the 
basis  to  maintain  sparsity  in  the  product  form  simplex  method,  Math.  Prog. 

2,  pp.  263-278. 

Garda-Palomares,  U.  M.  and  Mangasarian,  O.  L.  (1976).  Super  linearly  conver¬ 
gent  quasi-Newton  algorithms  for  nonlinearly  constrained  optimisation  prob¬ 
lems,  Math.  Prog.  11,  pp.  1-13. 

Gill,  P.  E.  and  Murray,  W.  (1974).  Newton-type  methods  for  unconstrained  and 
linearly  constrained  optimisation,  Math.  Prog.  28,  pp.  311-350. 

GUI,  P.  E.  and  Murray,  W.  (1979a).  The  computation  of  Lagrange  multiplier 
estimates  for  constrained  minimisation,  Math.  Prog.  17,  pp  32-00. 

GUI,  P.  B.  and  Murray,  W.  (1979b).  Conjugate-gradient  methods  for  large- 
scale  nonlinear  optimisation,  Report  SOL  79-15,  Department  of  Operations 
Research,  Stanford  University. 

GUI,  P.  E.,  Murray,  W.  and  Nash,  S.  G.  (1981).  Newton-type  minimisation 


methods  using  the  linear  conjugate- gradient  method,  Report  (to  appear). 
Department  of  Operations  Research,  Stanford  University,  California. 

Han,  S.-P.  (1976).  Superlinearly  convergent  variable  metric  algorithms  for  gen¬ 
eral  nonlinear  programming  problems,  Math.  Prog.  11,  pp.  263-282. 

Han,  S.-P.  (1977).  A  globally  convergent  method  for  nonlinear  programming,  J. 
Opt.  Th.  Applies.  22,  pp.  297-310. 

Hestenes,  M.  R.  and  Stiefel,  E.  (1952).  Methods  of  conjugate  gradients  for  solving 
linear  systems,  J.  Res.  Net.  Bur.  Standards  49,  pp.  409-436. 

Hillstrom,  K.  E.  (1977).  A  simulation  test  approach  to  the  evaluation  of  nonlinear 
optimisation  algorithms,  ACM  Trans.  Math.  Software  3,  pp.  305-315. 

Lemke,  C.  E.  (1965).  Bimatrix  equilibrium  points  and  mathematical  program¬ 
ming,  Management  Science  11,  pp.  681-689. 

Lyness,  J.  N.  and  Greenweil,  C.  (1977).  A  pilot  scheme  for  minimisation  software 
evaluation,  Tech.  Memo.  323,  Argonne  National  Laboratory,  Argonne,  Illinois. 

Maratos,  N.  (1978).  Exact  Penalty  Function  Algorithms  for  Finite-Dimensional 
and  Control  Optimisation  Problems,  Ph.  D.  Thesis,  University  of  London. 

Murray,  W.  (1969).  "An  algorithm  for  constrained  minimisation”,  in  Optim¬ 
isation  (R.  Fletcher,  ed.),  pp.  247-258,  Academic  Press,  London  and  New 
York. 

Murray,  W.  and  Wright,  M.  H.  (1980).  Computation  of  the  search  direction 
in  constrained  optimisation  algorithms,  Report  SOL  80-2,  Department  of 
Operations  Research,  Stanford  University,  to  appear  in  Math.  Prog.  Study 
on  Constrained  Optimisation. 

Murtagh,  B.  A.  and  Saunders,  M.  A.  (1977).  MINOS  User’s  Guide,  Report  SOL 
77-9,  Department  of  Operations  Research,  Stanford  University. 

Murtagh,  B.  A.  and  Saunders,  M.  A.  (1978).  Large-scale  linearly  constrained 
optimisation,  Math.  Prog.  14,  pp.  41-72. 

Murtagh,  B.  A.  and  Saunders,  M.  A.  (1980a).  The  implementation  of  a  Lagrangian- 
based  algorithm  for  sparse  nonlinear  constraints,  Report  SOL  80-1,  Depart¬ 
ment  of  Operations  Research,  Stanford  University,  to  appear  in  Math.  Prog. 
Study  on  Constrained  Optimisation. 

Murtagh,  B.  A.  and  Saunders,  M.  A.  (1980b).  MINOS/AUGMENTBD  User’s 
Manual,  Report  SOL  80-14,  Department  of  Operations  Research,  Stanford 
University. 


22  QP-Bind  Mrttodi  for  Lvf-8cok  Optkokatba 

Orchard-Hays,  W.  (1968).  Advanced  Linear-Programming  Computing  Techniques, 
McGraw-Hill,  New  York. 

Ortega,  J.  M.  and  Rheinboldt,  W.  C.  (1970).  Iterative  Solution  of  Nonlinear 
Equations  in  Several  Variables,  Academic  Press,  London  and  New  York. 

Powell,  M.  J.  D.  (1977).  A  fait  algorithm  for  nonli nearly  constrained  optimisation 
calculations,  Report  DAMTP  77/NA  2,  University  of  Cambridge. 

Powell,  M.  J.  D.  (1978).  "The  convergence  of  variable  metric  methods  for  non- 
linearly  constrained  optimisation  calculations”,  in  Nonlinear  Programming 
3  (O.  L.  Mangasarian,  R.  R.  Meyer,  and  S.  M.  Robinson,  eds.),  pp.  27-63, 
Academic  Press,  London  and  New  York. 

Powell,  M.  J.  D.  and  Toint,  Ph.  L.  (1979).  On  the  estimation  of  sparse  Hessian 
matrices,  SIAM  J.  Numer.  Anal.  16,  pp.  1060-1074. 

Reid,  J.  K.  (1976).  Fortran  subroutines  for  handling  sparse  linear  programming 
bases,  Report  R8269,  Atomic  Energy  Research  Establishment,  Harwell, 
England. 

Robinson,  S.  M.  (1972).  A  quadratically  convergent  algorithm  for  general  non¬ 
linear  programming  problems.  Math.  Prog.  6,  pp.  145-156. 

Robinson,  8.  M.  (1974).  Perturbed  Kuhn-Tucksr  points  and  rates  of  convergence 
for  a  class  of  nonlinear  programming  problems,  Math.  Prog.  7,  pp.  1-16. 

Rosen,  J.  B.  (1978).  "Two-phase  algorithm  for  nonlinear  constraint  problems”, 
in  Nonlinear  Programming  3,  (O.  L.  Mangasarian,  R.  R.  Meyer,  and  S.  M. 
Robinson,  eds.),  pp.  97-124,  Academic  Press,  London  and  New  York. 

Rosen,  J.  B.  and  Kreuser,  J.  (1972).  "A  gradient  projection  algorithm  for  non¬ 
linear  constraints”,  in  Numerical  Methods  for  Non-linear  Optimisation,  (F. 
A.  Lootsma,  ed.),  pp.  297-300,  Academic  Press,  London  and  New  York. 

Saunders,  M.  A.  (1976).  A  fast,  stable  implementation  of  the  simplex  method 
using  Bartels-Golub  updating,  in  Sparse  Matrix  Computations,  (J.  R.  Bunch 
and  D.  I.  Rose,  eds.),  pp.  213-226,  Academic  Press,  London  and  New  York. 

Shanno,  D.  F.  (1978).  Conjugate  gradient  methods  with  inexact  searches,  Math. 
of  Oper.  Res.  3,  pp.  244-256. 

Shanno,  D.  F.  (1979).  Computational  experience  with  methods  for  estimating 
sparse  Hessians  for  nonlinear  optimisation,  Report  MIS  79-8,  School  of 
Management  and  Information  Science,  University  of  Arisona. 

Tapia,  R.  A.  (1978).  "Quasi-Newton  methods  for  equality  constrained  optim- 


as 


fiafera m 

isation:  equivalence  of  existing  methods  and  a  new  implementation” ,  in 
Nonlinear  Programming  3,  (O.  L.  Mangasarian,  R.  R.  Meyer,  and  S.  M. 
Robinson,  eds.),  pp.  125-164,  Academic  Press,  London  and  New  York. 

Thapa,  M.  N.  (1080).  Optimisation  of  l/nconstrained  Functions  with  Sparse 
Hessian  Matrices,  Ph.  D.  Thesis,  Stanford  University. 

Toint,  P.  L.  (1077).  On  sparse  and  symmetric  matrix  updating  subject  to  a  linear 
equation.  Math.  Comp.  81,  pp.  054-061. 

Tomlin,  J.  A.  (1076).  Robust  implementation  of  Lemke’s  method  for  the  linear 
complementarity  problem,  Report  SOL  76-24,  Department  of  Operations 
Research,  Stanford  University. 

Van  der  Hoek,  G.  (1070).  Asymptotic  properties  of  reduction  methods  applying 
linearly  equality  constrained  reduced  problems,  Report  7033,  Econometric 
Institute,  Erasmus  University,  Rotterdam. 

Wilkinson,  J.  H.  (1065).  The  Algebraic  Eigenvalue  Problem,  The  Clarendon 
Press,  Oxford. 

Wilson,  R.  B.  (1063).  A  SimpHcial  Algorithm  for  Concave  Programming,  Ph.  D. 
Thesis,  Harvard  University. 

Wolfe,  P.  (1062).  The  reduced  gradient  method,  unpublished  manuscript,  The 
RAND  Corporation. 

Wright,  M.  H.  (1076).  Numerical  Methods  for  Nonlinearly  Constrained  Optim¬ 
isation,  Ph.  D.  Thesis,  Stanford  University. 


■ 


UNCLASSIFIED 


MCUMTY  CLARIFICATION  OF  TNH  »Am 


DOCUMeNTATNM  PACE 


SOL  81-1  ✓ 


4.  TITLE  (M*  MMn 


QP-Based  Methods  for  Large-Scale  Nonllnearly 
Constrained  Optimization 


I  CATALOG  NUMBER 


».  Tyre  or  REPORT  A  PC  MOO  COVERED 


Technical  Report 


«.  PERFORMING  ORO.  REPORT  NUMBER 


Philip  E.  Gill,  Mai  ter  Murray, 

Michael  A.  Saunders,  and  Margaret  H.  Wright 


PERPORMINB  ORGANIZATION  MAMS  AMO  ADOREM 

Department  of  Operations  Research  -  SOL 
Stanford  University 
Stanford,  CA  94305 


AMT  HUMBERT*) 

DAAG29-79-C-01 1 0  ^ 
N0001 4-75-C-0267 


•  I.  CONTROLLING  OFFICE  NAME  AND  ADOREM 

U.S.  Army  Research  Office 
P.0.  Box  12211 

Research  Triangle  Park,  NC  27709 


NAME  I  ADOREMfff  AHmnI  trwm  ChMIIIM  OMImm) 

Office  of  Naval  Research 
Dept,  of  the  Navy 
800  N.  Quincy  Street 
Arlington,  VA  22217 


M.  DISTRIBUTION  STATEMENT  (ml  Ml* 

This  document  has  been  approved  for  public  release  and  sale; 
Its  distribution  Is  unlimited. 


It.  REPORT  DATE 

January  1981 


•  NUMBER  OF  PAOES 

23 


trwm  CmMWM  OMIe*)  IS.  SECURITY  CLASS,  (ml  Ml*  i 

UNCLASSIFIED 


IT.  DISTRIBUTION  STATEMENT  (ml  M*  i 


I  N  SUM  N,  II  MMfmnr  I 


It.  REV  SORM  (CmmHmmm  mm  imm  mMm  It  — — — F 

large-scale  optimization  sparse  Hessian 

QP-based  methods  sparse  linear  and  nonlinear  constraints 

nonllnearly  constrained  optimization  numerical  software 
Lagranglan  methods 


ABSTRACT  (CmUmmm  mm  mm  mMm  M  mmmmmmmn  «"F  |RM»  Ap  MM  mmmOmi) 

Several  methods  fbr  nonllnearly  constrained  optimization  have  been  suggested 
In  recent  years  that  are  based  on  solving  a  quadratic  programming  (QP)  sub¬ 
problem  to  determine  the  direction  of  search.  Even  for  dense  problems,  there 
Is  no  consensus  at  present  concerning  the  “best"  formulation  of  the  QP  sub¬ 
problem.  When  solving  large  problems,  many  of  the  options  possible  for  small 
problems  become  unreasonably  expensive  In  terms  of  storage  and/or  arithmetic 
operations.  This  paper  discusses  the  Inherent  difficulties  of  developing 
QP-based  methods  for  large-scale  nonllnearly  constrained  optimization,  and 
suggests  some  possible  approaches. 


